I first read in the data:
I generate the age and gender vectors that will be used for modeling in all patients (donors, recipients and couples):
## COUPLENUMBER Gender DOG DOB GROUP
## R4147_D0 1 M 2015-07-23 1960-03-31 non_tolerant
## R4147_C1 1 F 2015-07-23 1962-02-02 non_tolerant
## R3574_D0 10 F 2015-04-10 1950-09-17 primary_tolerant
## R3574_S2 10 F 2015-04-10 1968-11-26 primary_tolerant
## R557_D0 11 F 2013-05-29 1982-01-16 primary_tolerant
## R557_S2 11 F 2013-05-29 1980-07-12 primary_tolerant
## gender_comp age_recip
## R4147_D0 MF 19529
## R4147_C1 MF 19529
## R3574_D0 FF 16936
## R3574_S2 FF 16936
## R557_D0 FF 12009
## R557_S2 FF 12009
I then filter out the metabolies that are xenobiotics drugs, as well as the metabolites which are not common to both the St Louis and the Cryostem cohort:
xeno <- which(meta_metabo[2,]=="Xenobiotics")
data_metabolites <- data_metabolites[,-xeno[which(which(meta_metabo[2,]=="Xenobiotics") %in%
grep("Drug", meta_metabo[3,]))]] # rm drug xenobiotics
data_metabo_local <- read.xlsx("~/Documents/VIB/Projects/Integrative_Paris/Local_cohort/Metabo/Metabolomic local cohort Saint-Louis_filtered.xlsx", rows = c(1:82), colNames = F, rowNames = T)
local_metabolites <- data_metabo_local[3,]
other_cohort_metabolites <- local_metabolites
I use my home made function to pre-process the metabolites as I did for the local cohort: - I filter out the metabolies that are not common to both the St Louis and the Cryostem cohort - I filter out the metabolites for which we have more than 50% of missing values in all our sub-groups (non-tolerant, tolerant 1 or tolerant 2 recipients) - I replace the missing values by 1/2 of the smallest value measured for each metabolite plus some noise - I filter out the metabolites that do not vary enough across patients - I finally log-transform, normalise and save the data
We can build a PCA on the recipients to see if we already observe some structure in the data, and if it isn’t batch affected:
## Warning: Removed 3 rows containing missing values (geom_point).
We don’t observe much structure in the data groups: they seem to be quite mixed. We don’t observe batch effects dur to the hospital neither. I also colored the recipients per CMV status, but there doesn’t seem to be a CMV positive group, as opposed to what was observed in the CyTOF data.
We can try to build a random forest model on all features of the recipients, to already roughly see if there is some information in the recipients data, before performing any feature selection :
##
## Call:
## randomForest(formula = Group ~ ., data = norm_data, mtry = 40)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 40
##
## OOB estimate of error rate: 56.25%
## Confusion matrix:
## non_tolerant primary_tolerant secondary_tolerant
## non_tolerant 4 6 4
## primary_tolerant 4 9 5
## secondary_tolerant 2 6 8
## class.error
## non_tolerant 0.7142857
## primary_tolerant 0.5000000
## secondary_tolerant 0.5000000
But the classification into three groups is quite poor. We can try classifying in two steps: First, we try to classify between non tolerant and tolerant patients:
##
## Call:
## randomForest(formula = group ~ ., data = df_tmp, mtry = mtry, ntree = ntree, importance = T, proximity = T)
## Type of random forest: classification
## Number of trees: 10000
## No. of variables tried at each split: 30
##
## OOB estimate of error rate: 27.08%
## Confusion matrix:
## non_tolerant tolerant class.error
## non_tolerant 4 10 0.71428571
## tolerant 3 31 0.08823529
Volcano plots between tolerant and non tolerant patients:
Second, we try to classify only primary and seconday tolerant recipients:
##
## Call:
## randomForest(formula = group ~ ., data = df_tol, mtry = mtry, ntree = ntree, importance = T, proximity = T)
## Type of random forest: classification
## Number of trees: 10000
## No. of variables tried at each split: 30
##
## OOB estimate of error rate: 50%
## Confusion matrix:
## primary_tolerant secondary_tolerant class.error
## primary_tolerant 10 8 0.4444444
## secondary_tolerant 9 7 0.5625000
Volcano plots between primary and secondary tolerant patients:
Since neither the first model (classifying between tolerant and non tolerant) nore the second model (classifying between primary and secondary tolerant recipients) seem to give results in the Cryostem cohort, we apply feature selection, to identify the metabolites that might be playing a role in these classification problems in a supervised way:
I first add the demographics info to the norm_data matrix:
And then compute the permutation distribution for the non vs tolerance:
Visualise the resulting quantile values: In the first plot: quantile values of the models on metabolites one by one In the second plot: quantile values of the difference between group~ age+gender and group~ age+gender+metabolite
Compute the median per group for every gene, to use it later in the graphs:
# I first rearrange the nor data matrix (so that I don't have to add +1 to select features)
norm_data <- norm_data %>%
rownames_to_column("rn") %>%
select(-group, everything()) %>%
column_to_rownames("rn")
## Note: zip::zip() is deprecated, please use zip::zipr() instead
## [1] "93 metabolites were selected with a 0.90 threshold in the St Louis cohort."
We can see which of these metabolites were also selected in the St Louis cohort to separate tolerant from non tolerant recipients:
## [1] "44 features were commonly found in both cohorts"
We can see how many of these features had the same behaviour in the two cohorts (ie, were over or underexpressed in the same group of patients)
## [1] "42 features out of the 44 common selected features have the same behaviour in both cohorts."
## [1] "X1..1.enyl.palmitoyl..2.arachidonoyl.GPC..P.16.0.20.4.."
## [2] "X1.arachidonoyl.GPE..20.4n6.."
## [3] "X1.linoleoyl.2.arachidonoyl.GPC..18.2.20.4n6.."
## [4] "X1.palmitoyl.2.gamma.linolenoyl.GPC..16.0.18.3n6.."
## [5] "X1.palmitoylglycerol..16.0."
## [6] "X1.stearoyl.2.arachidonoyl.GPC..18.0.20.4."
## [7] "X1.stearoyl.2.arachidonoyl.GPI..18.0.20.4."
## [8] "X3.methyl.2.oxovalerate"
## [9] "X4.methyl.2.oxopentanoate"
## [10] "alanine"
## [11] "androstenediol..3beta.17beta..monosulfate..1."
## [12] "androsterone.sulfate"
## [13] "asparagine"
## [14] "beta.alanine"
## [15] "choline.phosphate"
## [16] "cysteine.glutathione.disulfide"
## [17] "dehydroisoandrosterone.sulfate..DHEA.S."
## [18] "epiandrosterone.sulfate"
## [19] "gamma.glutamylalanine"
## [20] "gamma.glutamylglutamine"
## [21] "glucuronate"
## [22] "histidine"
## [23] "hydroxycotinine"
## [24] "indolelactate"
## [25] "isoleucine"
## [26] "leucine"
## [27] "lysine"
## [28] "methionine"
## [29] "methyl.4.hydroxybenzoate.sulfate"
## [30] "N.palmitoyl.heptadecasphingosine..d17.1.16.0.."
## [31] "ornithine"
## [32] "oxalate..ethanedioate."
## [33] "proline"
## [34] "serine"
## [35] "sphingosine.1.phosphate"
## [36] "theobromine"
## [37] "threonate"
## [38] "threonine"
## [39] "trans.4.hydroxyproline"
## [40] "tryptophan"
## [41] "tyrosine"
## [42] "valine"
Graph of the selected metabolites that are common between the two cohorts: The features that were correlated (ie expressed in similar ways among the recipients) are linked in the graph. The features that were more expressed in the non tolerant St Louis recipients are colored in red, while the features that were more expressed in tolerant recipients are colored in blue.
I save the common features in an excel file: The excel file contains information on which features were correlated in the graph above. It also contains, for every gene, the associated logFC, pvalue…, where the comparison is in non tolerant versus tolerant patients (eg, a positive Fold Change for a gene therefore means that the gene is overexpressed in non tolrant patients)
We can try to build a RF model based on these common features:
##
## Call:
## randomForest(formula = group ~ ., data = df_tmp, mtry = mtry, ntree = ntree, importance = T, proximity = T)
## Type of random forest: classification
## Number of trees: 1000
## No. of variables tried at each split: 7
##
## OOB estimate of error rate: 22.92%
## Confusion matrix:
## non_tolerant tolerant class.error
## non_tolerant 5 9 0.64285714
## tolerant 2 32 0.05882353
The classification of the Cryostem recipients into tolerant and non tolerant groups has not really improved after feature selection.
Now that we have selected metabolites that seemed to play a role in tolerance that were common to both cohorts (with a threhold over 0.90), we decide to investigate into these metabolites to see if some of them are related to the age of the recipients, to the gender compatibility between the donor and recipient, to both or to none of these two causes.
I first build new models using the metabolites that were kept in the two cohorts as informative when comparing tolerant and non tolerant patients:
new_models <- lapply(seq_along(common_features$common_features), function(idx){
dta_tmp <- norm_data[,c(common_features$common_features[idx],
"group", "age_recip", "gender_comp")]
colnames(dta_tmp)[1] <- "Var"
fit1 <- glm(group ~ Var, data=dta_tmp,trace=F, family = "binomial")
fit2 <- glm(group ~ gender_comp + Var,
data=dta_tmp,trace=F, family = "binomial")
fit3 <- glm(group ~ age_recip + Var,
data=dta_tmp,trace=F, family = "binomial")
fit4 <- glm(group ~ gender_comp + age_recip + Var,
data=dta_tmp,trace=F, family = "binomial")
list(fit1, fit2, fit3, fit4)
})
plots_gender_age(pdf_name = "../plots/metabo_national/recip/age_and_gender/tol_nontol_0.9.pdf",
norm_data = norm_data,
features = common_features$common_features)
## quartz_off_screen
## 2
plots_gender_age(pdf_name = "../plots/metabo_national/recip/age_and_gender/tol_nontol_0.9_behavior.pdf",
norm_data = norm_data,
features = common_beh_features)
## quartz_off_screen
## 2
And then compute the permutation distribution for the non vs tolerance:
norm_data <- norm_data_3gr
norm_data <- norm_data[which(norm_data$group != "non_tolerant"),]
norm_data$group <- as.factor(norm_data$group)
# Calculate the perm_values on PRISM:
perm_vals_PRISM(norm_data, PRISM_name = "metabo_recip_prim_sec")
# Fetch the data back from PRISM
handle2 <- readRDS("handle2.rds")
perm_vals_PS<-qsub_retrieve(handle2)
save(perm_vals_PS, file = "~/Documents/VIB/Projects/Integrative_Paris/Integrative/outputs/data/metabo_national/recip/perm_vals_PS.RData")
save(norm_data, file = "~/Documents/VIB/Projects/Integrative_Paris/Integrative/outputs/data/metabo_national/recip/norm_data_PS.RData")
Visualise the resulting quantile values: In the first plot: quantile values of the models on metabolites one by one In the second plot: quantile values of the difference between group~ age+gender and group~ age+gender+metabolite
Compute the median per group for every gene, to use it later in the graphs:
Selected metabolites for the recipients (with a threshold of 0.90), after computing permutation distributions for every metabolite:
# I first rearrange the nor data matrix (so that I don't have to add +1 to select features)
norm_data <- norm_data %>%
rownames_to_column("rn") %>%
select(-group, everything()) %>%
column_to_rownames("rn")
selected_ft_recip_qt <- select_features(perm_vals = perm_vals,
norm_data = norm_data,
threshold = 0.90,
file_path = recip_path,
file_name = "perm_val_PTST_90_thresh.xlsx")
print(paste0(length(selected_ft_recip_qt),
" metabolites were selected with a 0.90 threshold in the Cryostem cohort."))
## [1] "61 metabolites were selected with a 0.90 threshold in the Cryostem cohort."
We can see which of these metabolites were also selected in the St Louis cohort to separate tolerant from non tolerant recipients:
## [1] "3 features were commonly found in both cohorts"
We can see how many of these features had the same behaviour in the two cohorts (ie, were over or underexpressed in the same group of patients)
## [1] "2 features out of the 3 common selected features have the same behaviour in both cohorts."
## [1] "X2.palmitoyl.GPC..16.0.." "X3.aminoisobutyrate"
Graph of the selected metabolites that are common between the two cohorts: The features that were correlated (ie expressed in similar ways among the recipients) are linked in the graph. The features that were more expressed in the non tolerant St Louis recipients are colored in red, while the features that were more expressed in tolerant recipients are colored in blue.
I save the common features in an excel file: The excel file contains information on which features were correlated in the graph above. It also contains, for every gene, the associated logFC, pvalue…, where the comparison is in non tolerant versus tolerant patients (eg, a positive Fold Change for a gene therefore means that the gene is overexpressed in non tolrant patients)
Boxplots of these common metabolites:
Now that we have selected metabolites that seemed to play a role in tolerance that were common to both cohorts (with a threhold over 0.90), we decide to investigate into these metabolites to see if some of them are related to the age of the recipients, to the gender compatibility between the donor and recipient, to both or to none of these two causes.
I first build new models using the metabolites that were kept in the two cohorts as informative when comparing primary and secondary tolerant patients:
new_models <- lapply(seq_along(common_features$common_features), function(idx){
dta_tmp <- norm_data[,c(common_features$common_features[idx],
"group", "age_recip", "gender_comp")]
colnames(dta_tmp)[1] <- "Var"
fit1 <- glm(group ~ Var, data=dta_tmp,trace=F, family = "binomial")
fit2 <- glm(group ~ gender_comp + Var,
data=dta_tmp,trace=F, family = "binomial")
fit3 <- glm(group ~ age_recip + Var,
data=dta_tmp,trace=F, family = "binomial")
fit4 <- glm(group ~ gender_comp + age_recip + Var,
data=dta_tmp,trace=F, family = "binomial")
list(fit1, fit2, fit3, fit4)
})
plots_gender_age(pdf_name = "../plots/metabo_national/recip/age_and_gender/prim_sec_0.9.pdf",
norm_data = norm_data,
features = common_features$common_features)
## quartz_off_screen
## 2
plots_gender_age(pdf_name = "../plots/metabo_national/recip/age_and_gender/prim_sec_0.9_behavior.pdf",
norm_data = norm_data,
features = common_beh_features)
## quartz_off_screen
## 2
I use my home made function to pre-process the metabolites as I did for the local cohort: - I filter out the metabolies that are not common to both the St Louis and the Cryostem cohort - I filter out the metabolites for which we have more than 50% of missing values in all our sub-groups (non-tolerant, tolerant 1 or tolerant 2 recipients) - I replace the missing values by 1/2 of the smallest value measured for each metabolite plus some noise - I filter out the metabolites that do not vary enough across patients - I finally log-transform, normalise and save the data
We can build a PCA on the recipients to see if we already observe some structure in the data, and if it isn’t batch affected:
## Warning: Removed 2 rows containing missing values (geom_point).
We don’t observe much structure in the data groups: they seem to be quite mixed. We don’t observe batch effects due to the hospital neither. I also colored the recipients per CMV status, but there doesn’t seem to be a CMV positive group, as opposed to what was observed in the CyTOF data.
We can try to build random forest models on the donors before applying any feature selection, but it seems like the metabolomics data isn’t sufficient to classify donors leading to tolerance or non tolerance:
##
## Call:
## randomForest(formula = Group ~ ., data = norm_data, mtry = 40)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 40
##
## OOB estimate of error rate: 83.33%
## Confusion matrix:
## non_tolerant primary_tolerant secondary_tolerant
## non_tolerant 0 4 10
## primary_tolerant 2 8 8
## secondary_tolerant 6 10 0
## class.error
## non_tolerant 1.0000000
## primary_tolerant 0.5555556
## secondary_tolerant 1.0000000
Trying to classify between non tolerant and tolerant patients:
##
## Call:
## randomForest(formula = group ~ ., data = df_tmp, mtry = mtry, ntree = ntree, importance = T, proximity = T)
## Type of random forest: classification
## Number of trees: 10000
## No. of variables tried at each split: 30
##
## OOB estimate of error rate: 31.25%
## Confusion matrix:
## non_tolerant tolerant class.error
## non_tolerant 0 14 1.00000000
## tolerant 1 33 0.02941176
Trying to classify only primary and secondary tolerant donors:
##
## Call:
## randomForest(formula = group ~ ., data = df_tol, mtry = mtry, ntree = ntree, importance = T, proximity = T)
## Type of random forest: classification
## Number of trees: 10000
## No. of variables tried at each split: 30
##
## OOB estimate of error rate: 67.65%
## Confusion matrix:
## primary_tolerant secondary_tolerant class.error
## primary_tolerant 8 10 0.5555556
## secondary_tolerant 13 3 0.8125000
Run the permutation distribution, taking demographic variables into account: I first add the demographics info to the norm_data:
And then compute the permutation distribution:
norm_data$group <- as.character(norm_data$group)
norm_data$group[which(norm_data$group!="non_tolerant")] <- "tolerant"
norm_data$group <- as.factor(norm_data$group)
# Calculate the number of cores
no_cores <- detectCores() - 1
# Initiate cluster
cl<-makeCluster(no_cores)
clusterExport(cl, c("norm_data", "perm_val_LR_demo"))
Sys.time()
perm_vals <- parLapply(cl, seq_along(colnames(norm_data)[-which(colnames(norm_data) %in% c("group", "age_recip", "gender_comp"))]), function(feat){
perm_val_LR_demo(data = norm_data, nb_perm = 1000, ind_feature = feat+1)
})
stopCluster(cl)
Sys.time()
save(perm_vals, file = "~/Documents/VIB/Projects/Integrative_Paris/Integrative/outputs/data/metabo_national/donors/perm_vals_TNT.RData")
save(norm_data, file= "~/Documents/VIB/Projects/Integrative_Paris/Integrative/outputs/data/metabo_national/donors/norm_data_TNT.RData")
Compute the median per group for every metabolite, to use it later in the graphs:
# I first rearrange the nor data matrix (so that I don't have to add +1 to select features)
norm_data <- norm_data %>%
rownames_to_column("rn") %>%
select(-group, everything()) %>%
column_to_rownames("rn")
Selected metabolites for the donors, after computing permutation distributions for every metabolite:
## [1] "55 metabolites were selected with a 0.90 threshold in the Cryostem cohort."
We can see which of these metabolites were also selected in the St Louis cohort to separate tolerant from non tolerant recipients:
## [1] "12 features were commonly found in both cohorts"
We can see how many of these features had the same behaviour in the two cohorts (ie, were over or underexpressed in the same group of patients)
## [1] "6 features out of the 12 common selected features have the same behaviour in both cohorts."
## [1] "X5alpha.pregnan.3beta.20alpha.diol.disulfate"
## [2] "androstenediol..3beta.17beta..disulfate..2."
## [3] "androsterone.sulfate"
## [4] "beta.alanine"
## [5] "lysine"
## [6] "taurocholenate.sulfate"
Graph of the selected metabolites that are common between the two cohorts: The features that were correlated (ie expressed in similar ways among the recipients) are linked in the graph. The features that were more expressed in the non tolerant St Louis recipients are colored in red, while the features that were more expressed in tolerant recipients are colored in blue.
I save the common features in an excel file: The excel file contains information on which features were correlated in the graph above. It also contains, for every gene, the associated logFC, pvalue…, where the comparison is in non tolerant versus tolerant patients (eg, a positive Fold Change for a gene therefore means that the gene is overexpressed in non tolrant patients)
We can try to build a RF model based on these common features:
##
## Call:
## randomForest(formula = group ~ ., data = df_tmp, mtry = mtry, ntree = ntree, importance = T, proximity = T)
## Type of random forest: classification
## Number of trees: 1000
## No. of variables tried at each split: 3
##
## OOB estimate of error rate: 22.92%
## Confusion matrix:
## non_tolerant tolerant class.error
## non_tolerant 7 7 0.5000000
## tolerant 4 30 0.1176471
But the classification is bad.
Now that we have selected metabolites that seemed to play a role in tolerance that were common to both cohorts (with a threhold over 0.90), we decide to investigate into these metabolites to see if some of them are related to the age of the recipients, to the gender compatibility between the donor and recipient, to both or to none of these two causes.
I first build new models using the metabolites that were kept in the two cohorts as informative when comparing tolerant and non tolerant patients:
new_models <- lapply(seq_along(common_features$common_features), function(idx){
dta_tmp <- norm_data[,c(common_features$common_features[idx],
"group", "age_recip", "gender_comp")]
colnames(dta_tmp)[1] <- "Var"
fit1 <- glm(group ~ Var, data=dta_tmp,trace=F, family = "binomial")
fit2 <- glm(group ~ gender_comp + Var,
data=dta_tmp,trace=F, family = "binomial")
fit3 <- glm(group ~ age_recip + Var,
data=dta_tmp,trace=F, family = "binomial")
fit4 <- glm(group ~ gender_comp + age_recip + Var,
data=dta_tmp,trace=F, family = "binomial")
list(fit1, fit2, fit3, fit4)
})
plots_gender_age(pdf_name = "../plots/metabo_national/donors/age_and_gender/tol_nontol_0.9.pdf",
norm_data = norm_data,
features = common_features$common_features)
## quartz_off_screen
## 2
plots_gender_age(pdf_name = "../plots/metabo_national/donors/age_and_gender/tol_nontol_0.9_behavior.pdf",
norm_data = norm_data,
features = common_beh_features)
## quartz_off_screen
## 2
Run the permutation distribution, taking demographic variables into account:
norm_data <- norm_data_3gr
norm_data <- norm_data[which(norm_data$group != "non_tolerant"),]
norm_data$group <- as.factor(as.character(norm_data$group))
# Calculate the perm_values on PRISM:
perm_vals_PRISM(norm_data, PRISM_name = "metabo_donors_prim_sec")
# Fetch the data back from PRISM
handle2 <- readRDS("../../handle_cryo_donors_PS.rds")
perm_vals<-qsub_retrieve(handle2)
save(perm_vals, file = "~/Documents/VIB/Projects/Integrative_Paris/Integrative/outputs/data/metabo_national/donors/perm_vals_PS.RData")
save(norm_data, file= "~/Documents/VIB/Projects/Integrative_Paris/Integrative/outputs/data/metabo_national/donors/norm_data_PS.RData")
Compute the median per group for every metabolite, to use it later in the graphs:
Compute the median per group for every gene, to use it later in the graphs:
gr_medians <- norm_data %>%
group_by(group) %>%
summarize_if(is.numeric, median) %>%
as.data.frame()
rownames(gr_medians) <- gr_medians$group
# I first rearrange the nor data matrix (so that I don't have to add +1 to select features)
norm_data <- norm_data %>%
rownames_to_column("rn") %>%
select(-group, everything()) %>%
column_to_rownames("rn")
Selected metabolites for the donors, after computing permutation distributions for every metabolite:
## [1] "38 metabolites were selected with a 0.90 threshold in the St Louis cohort."
We can see which of these metabolites were also selected in the St Louis cohort to separate tolerant from non tolerant recipients:
## [1] "3 features were commonly found in both cohorts"
We can see how many of these features had the same behaviour in the two cohorts (ie, were over or underexpressed in the same group of patients)
## [1] "1 features out of the 3 common selected features have the same behaviour in both cohorts."
## [1] "palmitoyl.dihydrosphingomyelin..d18.0.16.0.."
Graph of the selected metabolites that are common between the two cohorts: The features that were correlated (ie expressed in similar ways among the recipients) are linked in the graph. The features that were more expressed in the non tolerant St Louis recipients are colored in red, while the features that were more expressed in tolerant recipients are colored in blue.
I save the common features in an excel file: The excel file contains information on which features were correlated in the graph above. It also contains, for every gene, the associated logFC, pvalue…, where the comparison is in non tolerant versus tolerant patients (eg, a positive Fold Change for a gene therefore means that the gene is overexpressed in non tolrant patients)
Now that we have selected metabolites that seemed to play a role in tolerance that were common to both cohorts (with a threhold over 0.90), we decide to investigate into these metabolites to see if some of them are related to the age of the recipients, to the gender compatibility between the donor and recipient, to both or to none of these two causes.
I first build new models using the metabolites that were kept in the two cohorts as informative when comparing tolerant and non tolerant patients:
new_models <- lapply(seq_along(common_features$common_features), function(idx){
dta_tmp <- norm_data[,c(common_features$common_features[idx],
"group", "age_recip", "gender_comp")]
colnames(dta_tmp)[1] <- "Var"
fit1 <- glm(group ~ Var, data=dta_tmp,trace=F, family = "binomial")
fit2 <- glm(group ~ gender_comp + Var,
data=dta_tmp,trace=F, family = "binomial")
fit3 <- glm(group ~ age_recip + Var,
data=dta_tmp,trace=F, family = "binomial")
fit4 <- glm(group ~ gender_comp + age_recip + Var,
data=dta_tmp,trace=F, family = "binomial")
list(fit1, fit2, fit3, fit4)
})
plots_gender_age(pdf_name = "../plots/metabo_national/donors/age_and_gender/prim_sec_0.9.pdf",
norm_data = norm_data,
features = common_features$common_features)
## quartz_off_screen
## 2
plots_gender_age(pdf_name = "../plots/metabo_national//donors/age_and_gender/prim_sec_0.9_behavior.pdf",
norm_data = norm_data,
features = common_beh_features)
## quartz_off_screen
## 2
load("~/Documents/VIB/Projects/Integrative_Paris/Integrative/outputs/data/metabo_national/r&d_minus3patients/big_mat.RData")
load("~/Documents/VIB/Projects/Integrative_Paris/Integrative/outputs/data/metabo_national/r&d_minus3patients/rd_meta.RData")
load("~/Documents/VIB/Projects/Integrative_Paris/Integrative/outputs/data/metabo_national/r&d_minus3patients/norm_data.RData")
norm_data$Group <- as.factor(norm_data$Group)
tokeep <- which(colnames(rd_meta) %in% c("GROUP", "METABONAME", "COUPLENUMBER"))
vars2rm <- colnames(rd_meta)[-c(tokeep)]
big_mat2 <- big_mat %>%
group_by(COUPLENUMBER) %>%
dplyr::select(-vars2rm)
subst <- big_mat2 %>%
summarise_if(is.numeric, ~.[1]-.[2])
rd_meta_subst <- rd_meta[which(rd_meta$COUPLENUMBER %in% subst$COUPLENUMBER),] %>%
arrange(COUPLENUMBER) %>%
dplyr::select(c("GROUP", "COUPLENUMBER")) %>%
unique()
subst <- subst %>%
mutate("couple" = paste0("couple_",COUPLENUMBER),
"group" = rd_meta_subst$GROUP) %>%
column_to_rownames("couple") %>%
dplyr::select (-"COUPLENUMBER")
subst <- subst[c(ncol(subst), 1:(ncol(subst)-1))]
colnames(subst) <- make.names(colnames(subst))
colnames(subst)[-1] <- paste0("subst_", colnames(subst))[-1]
We can apply a principal components analysis to see if there is some structure in the data that can already be seen:
In the following PCA plot, we plot the recipients as black dots, the donors as white dots, and we link donors and recipients that belong to same a couple by a line of the color of their group.
It looks like, in the metabolomics data of the Cryostem cohort, the distances between non tolerant donors and recipients are not really different from the distances between primary or secondary tolerant donors and recipients. This is similar to what we had observed in the CyTOF data of these same patients.
We can already try to classify the couples based on all the metabolites:
##
## Call:
## randomForest(formula = group ~ ., data = subst, mtry = 40)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 40
##
## OOB estimate of error rate: 41.67%
## Confusion matrix:
## non_tolerant primary_tolerant secondary_tolerant
## non_tolerant 9 3 2
## primary_tolerant 2 11 5
## secondary_tolerant 1 7 8
## class.error
## non_tolerant 0.3571429
## primary_tolerant 0.3888889
## secondary_tolerant 0.5000000
Trying to classify between non tolerant and tolerant patients:
##
## Call:
## randomForest(formula = group ~ ., data = df_tmp, mtry = mtry, ntree = ntree, importance = T, proximity = T)
## Type of random forest: classification
## Number of trees: 10000
## No. of variables tried at each split: 30
##
## OOB estimate of error rate: 22.92%
## Confusion matrix:
## non_tolerant tolerant class.error
## non_tolerant 4 10 0.71428571
## tolerant 1 33 0.02941176
Trying to classify only primary and seconday tolerant recipients:
##
## Call:
## randomForest(formula = group ~ ., data = df_tol, mtry = mtry, ntree = ntree, importance = T, proximity = T)
## Type of random forest: classification
## Number of trees: 10000
## No. of variables tried at each split: 30
##
## OOB estimate of error rate: 44.12%
## Confusion matrix:
## primary_tolerant secondary_tolerant class.error
## primary_tolerant 12 6 0.3333333
## secondary_tolerant 9 7 0.5625000
But it seems like the metabolomics data might not hold enough information to classify the couples into the difference tolerance groups. We try applying feature selection to select the features that seem to be playing a role in tolerance:
I first add the demographics info to the couple matrix:
I then compute the permutation distribution for non vs tolerant recipients.
Compute the median per group for every gene, to use it later in the graphs:
Selected metabolites for the couples, with a .95 threshold:
## [1] "88 metabolites were selected with a 0.95 threshold in the Cryostem cohort."
Which of these metabolites are common to the St louis cohort?
## [1] 17
## [1] "subst_X1.5.anhydroglucitol..1.5.AG."
## [2] "subst_X1.palmitoleoylglycerol..16.1.."
## [3] "subst_X1.palmitoyl.2.oleoyl.GPE..16.0.18.1."
## [4] "subst_X3.methyl.2.oxovalerate"
## [5] "subst_X4.methyl.2.oxopentanoate"
## [6] "subst_X5alpha.pregnan.3beta.20alpha.diol.monosulfate..2."
## [7] "subst_aspartate"
## [8] "subst_cysteine.glutathione.disulfide"
## [9] "subst_isoleucine"
## [10] "subst_leucine"
## [11] "subst_N.palmitoyl.sphinganine..d18.0.16.0."
## [12] "subst_stearoyl.arachidonoyl.glycerol..18.0.20.4...1.."
## [13] "subst_taurochenodeoxycholate"
## [14] "subst_taurocholate"
## [15] "subst_tryptophan"
## [16] "subst_uridine"
## [17] "subst_ximenoylcarnitine..C26.1.."
Visualising how these features change at the couple level: I extract information of the expression of these common features in the recipients, and in the couples (donor value - recipient value), in the two cohorts. In the following figures, the amount of a certain metabolite in the patients is represented. The recipients values are represented on the x-axis, such that recipients in whom the metabolite was very present are situated more to the right. The donor-recipient values are represented on the y-axis, such that couples in which the metabolite was more present in the donors than in the recipients are situated more to the top.
For each feature, the cryostem patients are on the left and the St Louis patients are on the right, for comparison between the two cohorts.
I save these common metabolites:
We can see how many of these features had the same behaviour in the two cohorts (ie, were over or underexpressed in the same group of patients)
## [1] "1 metabolites out of the 17 common selected metabolites have the same behaviour in both cohorts."
## [1] "subst_stearoyl.arachidonoyl.glycerol..18.0.20.4...1.."
We can try to build a RF model based on the common features:
##
## Call:
## randomForest(formula = group ~ ., data = df_tmp, mtry = mtry, ntree = ntree, importance = T, proximity = T)
## Type of random forest: classification
## Number of trees: 10000
## No. of variables tried at each split: 4
##
## OOB estimate of error rate: 20.83%
## Confusion matrix:
## non_tolerant tolerant class.error
## non_tolerant 7 7 0.50000000
## tolerant 3 31 0.08823529
The classification of the Cryostem recipients into tolerant and non tolerant groups has not really improved after feature selection.
Selected metabolites for the couples, with a .90 threshold:
## [1] "136 metabolites were selected with a 0.90 threshold in the Cryostem cohort."
Which of these metabolites are common to the St louis cohort?
## [1] 32
## [1] "subst_X1.2.dilinoleoyl.GPC..18.2.18.2."
## [2] "subst_X1.5.anhydroglucitol..1.5.AG."
## [3] "subst_X1..1.enyl.palmitoyl..2.oleoyl.GPC..P.16.0.18.1.."
## [4] "subst_X1.palmitoleoylglycerol..16.1.."
## [5] "subst_X1.palmitoyl.2.oleoyl.GPE..16.0.18.1."
## [6] "subst_X1.stearoyl.2.linoleoyl.GPE..18.0.18.2.."
## [7] "subst_X2.aminobutyrate"
## [8] "subst_X2.linoleoylglycerol..18.2."
## [9] "subst_X3.methyl.2.oxovalerate"
## [10] "subst_X4.methyl.2.oxopentanoate"
## [11] "subst_X5alpha.pregnan.3beta.20alpha.diol.monosulfate..2."
## [12] "subst_aspartate"
## [13] "subst_carnitine"
## [14] "subst_cholesterol"
## [15] "subst_cysteine.glutathione.disulfide"
## [16] "subst_EDTA"
## [17] "subst_ergothioneine"
## [18] "subst_glucuronate"
## [19] "subst_glycochenodeoxycholate"
## [20] "subst_glycocholate"
## [21] "subst_isoleucine"
## [22] "subst_leucine"
## [23] "subst_N.palmitoyl.sphinganine..d18.0.16.0."
## [24] "subst_nicotinamide"
## [25] "subst_ornithine"
## [26] "subst_proline"
## [27] "subst_stearoyl.arachidonoyl.glycerol..18.0.20.4...1.."
## [28] "subst_taurochenodeoxycholate"
## [29] "subst_taurocholate"
## [30] "subst_tryptophan"
## [31] "subst_uridine"
## [32] "subst_ximenoylcarnitine..C26.1.."
Save these common features:
Plot the graph of these selected metabolites colored based on the groups where the metabolites are most expressed:
We can see how many of these features had the same behaviour in the two cohorts (ie, were over or underexpressed in the same group of patients)
## [1] "3 metabolites out of the 32 common selected metabolites have the same behaviour in both cohorts."
## [1] "subst_X1.2.dilinoleoyl.GPC..18.2.18.2."
## [2] "subst_X1..1.enyl.palmitoyl..2.oleoyl.GPC..P.16.0.18.1.."
## [3] "subst_stearoyl.arachidonoyl.glycerol..18.0.20.4...1.."
Now that we have identified features that are informative in both cohorts to classify tolerant from non tolerant couples, we can try to use them to build a model to classify the couples of the Cryostem cohort:
## predicted
## true non_tolerant tolerant
## non_tolerant 7 7
## tolerant 2 32
## [1] "prediction error: 18.75"
But classifying the non tolerant couples as such is still not feasible.
Now that we have selected features that seemed to play a role in tolerance that were common to both cohorts (with a threhold over 0.90), we decide to investigate into these features to see if some of them are related to the age of the recipients, to the gender compatibility between the donor and recipient, to both or to none of these two causes.
I first build new models using the features that were kept in the two cohorts as informative when comparing tolerant and non tolerant patients:
new_models <- lapply(seq_along(make.names(sel_com_names)), function(idx){
dta_tmp <- norm_data[,c(make.names(sel_com_names)[idx],
"group", "age_recip", "gender_comp")]
colnames(dta_tmp)[1] <- "Var"
fit1 <- glm(group ~ Var, data=dta_tmp,trace=F, family = "binomial")
fit2 <- glm(group ~ gender_comp + Var,
data=dta_tmp,trace=F, family = "binomial")
fit3 <- glm(group ~ age_recip + Var,
data=dta_tmp,trace=F, family = "binomial")
fit4 <- glm(group ~ gender_comp + age_recip + Var,
data=dta_tmp,trace=F, family = "binomial")
list(fit1, fit2, fit3, fit4)
})
plots_gender_age(pdf_name = "../plots/metabo_national/r&d/age_and_gender/tol_nontol_0.9.pdf",
norm_data = norm_data,
features = make.names(sel_com_names))
## quartz_off_screen
## 2
I compute the permutation distribution for primary vs secondary tolerant donors:
Visualise the resulting quantile values: In the first plot: quantile values of the models on metabolites one by one In the second plot: quantile values of the difference between group~ age+gender and group~ age+gender+metabolite
Compute the median per group for every gene, to use it later in the graphs:
Selected metabolites for the couples, with a .95 threshold:
## [1] "37 metabolites were selected with a 0.95 threshold in the Cryostem cohort."
Which of these metabolites are common to the St louis cohort?
## [1] 1
## [1] "subst_X2.palmitoyl.GPC..16.0.."
We can see how many of these features had the same behaviour in the two cohorts (ie, were over or underexpressed in the same group of patients)
## [1] "1 metabolites out of the 1 common selected metabolites have the same behaviour in both cohorts."
## [1] "subst_X2.palmitoyl.GPC..16.0.."
Selected metabolites for the couples, with a .95 threshold:
## [1] "63 metabolites were selected with a 0.90 threshold in the Cryostem cohort."
Which of these metabolites are common to the St louis cohort?
## [1] 8
## [1] "subst_X1.oleoyl.2.docosahexaenoyl.GPC..18.1.22.6.."
## [2] "subst_X1.palmitoleoyl.2.linoleoyl.GPC..16.1.18.2.."
## [3] "subst_X2.palmitoyl.GPC..16.0.."
## [4] "subst_dimethylarginine..SDMA...ADMA."
## [5] "subst_N1.Methyl.4.pyridone.3.carboxamide"
## [6] "subst_sphingosine"
## [7] "subst_sphingosine.1.phosphate"
## [8] "subst_succinate"
Save these common features:
Visualising how these features change at the couple level: I extract information of the expression of these common features in the recipients, and in the couples (donor value - recipient value), in the two cohorts. In the following figures, the amount of a certain metabolite in the patients is represented. The recipients values are represented on the x-axis, such that recipients in whom the metabolite was very present are situated more to the right. The donor-recipient values are represented on the y-axis, such that couples in which the metabolite was more present in the donors than in the recipients are situated more to the top.
For each feature, the cryostem patients are on the left and the St Louis patients are on the right, for comparison between the two cohorts.
We can see how many of these features had the same behaviour in the two cohorts (ie, were over or underexpressed in the same group of patients)
## [1] "5 metabolites out of the 8 common selected metabolites have the same behaviour in both cohorts."
## [1] "subst_X1.oleoyl.2.docosahexaenoyl.GPC..18.1.22.6.."
## [2] "subst_X1.palmitoleoyl.2.linoleoyl.GPC..16.1.18.2.."
## [3] "subst_X2.palmitoyl.GPC..16.0.."
## [4] "subst_sphingosine"
## [5] "subst_sphingosine.1.phosphate"
We can try to build a RF model based on these common features:
## predicted
## true primary_tolerant secondary_tolerant
## primary_tolerant 13 5
## secondary_tolerant 8 8
## [1] "prediction error: 38.2352941176471"
The classification of the Cryostem couples into tolerant and non tolerant groups has not really improved after feature selection.
Now that we have selected features that seemed to play a role in tolerance that were common to both cohorts (with a threhold over 0.90), we decide to investigate into these features to see if some of them are related to the age of the recipients, to the gender compatibility between the donor and recipient, to both or to none of these two causes.
I first build new models using the features that were kept in the two cohorts as informative when comparing tolerant and non tolerant patients:
new_models <- lapply(seq_along(selected_metabolites_dr_qt[sel_common]), function(idx){
dta_tmp <- norm_data[,c(selected_metabolites_dr_qt[sel_common][idx],
"group", "age_recip", "gender_comp")]
colnames(dta_tmp)[1] <- "Var"
fit1 <- glm(group ~ Var, data=dta_tmp,trace=F, family = "binomial")
fit2 <- glm(group ~ gender_comp + Var,
data=dta_tmp,trace=F, family = "binomial")
fit3 <- glm(group ~ age_recip + Var,
data=dta_tmp,trace=F, family = "binomial")
fit4 <- glm(group ~ gender_comp + age_recip + Var,
data=dta_tmp,trace=F, family = "binomial")
list(fit1, fit2, fit3, fit4)
})
plots_gender_age(pdf_name = "../plots/metabo_national/r&d/age_and_gender/tol_nontol_0.9.pdf",
norm_data = norm_data,
features = selected_metabolites_dr_qt[sel_common])
## quartz_off_screen
## 2
Now that we have selected features at the recipient level, at the donor level and at the couple level, we can try to combine these features:
We first combine the features that were selected as highly informative to classify between tolerant and non tolerant patients from the couples, recipients and donors into one big table:
We can generate a PCA and a tSNE using all these features to see how the tolerant and non tolerant patients are represented in the reduced dimensions:
The 2 first components of the PCA seem to hold quite a lot of the variability in the data:
The non tolerant patients seem to be situated slightly more on the left of the PCA, even though they are quite mixed with the tolerant patients, which suggests that it might be quite hard to classify the Cryostem patients with the features that we extracted.
We can also generate a tSNE on the same data:
We can see if building a model using all these features combined leads to good classification results between tolerant and non tolerant patients:
## predicted
## true non_tolerant tolerant
## non_tolerant 7 7
## tolerant 0 34
## [1] "prediction error: 14.5833333333333"
It still seems to be quite hard to classify the non toelrant patients as such. We can still see which features were selected in the model to classify tolerant from non tolerant patients in the Cryostem cohort:
## subst_X3.methyl.2.oxovalerate
## 1.4259318
## subst_X4.methyl.2.oxopentanoate
## 1.1249931
## R_glucuronate
## 0.6709134
## subst_X5alpha.pregnan.3beta.20alpha.diol.monosulfate..2.
## 0.5492983
## D_X1.arachidonoyl.GPI..20.4..
## 0.4812407
## R_lysine
## 0.4719778
## subst_X2.linoleoylglycerol..18.2.
## 0.4435989
## D_beta.alanine
## 0.4384470
## D_X1.stearoyl.GPI..18.0.
## 0.4253304
## subst_glycocholate
## 0.3906369
## R_androsterone.sulfate
## 0.3869785
## subst_cysteine.glutathione.disulfide
## 0.3674578
## R_ornithine
## 0.3553099
## R_leucine
## 0.3519632
## R_threonine
## 0.3431387
## R_N.palmitoyl.heptadecasphingosine..d17.1.16.0..
## 0.3215389
## subst_X2.aminobutyrate
## 0.3158164
## R_epiandrosterone.sulfate
## 0.3152807
## subst_glucuronate
## 0.3148524
## subst_X1.palmitoleoylglycerol..16.1..
## 0.3051172
And visualise the top variables in a heatmap:
Reminder: high values for the variables that start with “subst_” mean that the difference of expression between the donor and the recipient of the same couple was high for this specific feature.
We first combine the features that were selected as highly informative to classify between tolerant and non tolerant patients from the couples, recipients and donors into one big table:
We can generate a PCA and a tSNE using all these features to see how the tolerant and non tolerant patients are represented in the reduced dimensions:
The 2 first components of the PCA seem to hold quite a lot of the variability in the data:
The non tolerant patients seem to be situated slightly more on the right of the PCA, even though they are quite mixed with the tolerant patients, which suggests that it might be quite hard to classify the Cryostem patients with the features that we extracted.
We can also generate a tSNE on the same data:
We can see if building a model using all these features combined leads to good classification results between tolerant and non tolerant patients:
## predicted
## true non_tolerant tolerant
## non_tolerant 6 8
## tolerant 5 29
## [1] "prediction error: 27.0833333333333"
It still seems to be quite hard to classify the non toelrant patients as such. We can still see which features were selected in the model to classify tolerant from non tolerant patients in the Cryostem cohort:
## R_glucuronate
## 1.3425636
## D_beta.alanine
## 0.8654996
## R_lysine
## 0.8359781
## R_epiandrosterone.sulfate
## 0.7980659
## R_X1.palmitoylglycerol..16.0.
## 0.7041693
## R_ornithine
## 0.6658835
## R_leucine
## 0.6505494
## R_threonine
## 0.6164693
## R_isoleucine
## 0.5740168
## R_N.palmitoyl.heptadecasphingosine..d17.1.16.0..
## 0.5649969
## R_androsterone.sulfate
## 0.5251893
## subst_stearoyl.arachidonoyl.glycerol..18.0.20.4...1..
## 0.5123885
## R_cysteine.glutathione.disulfide
## 0.4497044
## R_methyl.4.hydroxybenzoate.sulfate
## 0.4380769
## R_androstenediol..3beta.17beta..monosulfate..1.
## 0.4128466
## R_beta.alanine
## 0.4093816
## D_X5alpha.pregnan.3beta.20alpha.diol.disulfate
## 0.3854638
## R_proline
## 0.3754609
## R_choline.phosphate
## 0.3739079
## R_trans.4.hydroxyproline
## 0.3639501
And visualise the top variables in a heatmap:
Reminder: high values for the variables that start with “subst_” mean that the difference of expression between the donor and the recipient of the same couple was high for this specific feature.
We first combine the features that were selected as highly informative to classify between primary and secondary tolerant patients from the couples, recipients and donors into one big table:
We can generate a PCA and a tSNE using all these features to see how the tolerant and non tolerant patients are represented in the reduced dimensions:
The 2 first components of the PCA seem to hold quite a lot of the variability in the data:
It seems like the primary and secondary patients are quite mixed in the PCA.
We can also generate a tSNE on the same data:
We can see if building a model using all these features combined leads to good classification results between primary and secondary tolerant patients:
## predicted
## true primary_tolerant secondary_tolerant
## primary_tolerant 11 7
## secondary_tolerant 7 9
## [1] "prediction error: 41.1764705882353"
We can see which features were selected in the model to classify primary and secondary tolerant patients in the St Louis cohort:
## subst_dimethylarginine..SDMA...ADMA.
## 1.6526586
## subst_N1.Methyl.4.pyridone.3.carboxamide
## 1.6011136
## R_X3.aminoisobutyrate
## 1.4950353
## subst_sphingosine.1.phosphate
## 1.2931693
## subst_X1.oleoyl.2.docosahexaenoyl.GPC..18.1.22.6..
## 1.2745226
## R_X5.acetylamino.6.amino.3.methyluracil
## 1.2328902
## subst_succinate
## 1.1786264
## subst_X2.palmitoyl.GPC..16.0..
## 1.1091733
## subst_X1.palmitoleoyl.2.linoleoyl.GPC..16.1.18.2..
## 1.0411937
## subst_sphingosine
## 1.0293748
## D_X3.aminoisobutyrate
## 1.0217826
## R_X2.palmitoyl.GPC..16.0..
## 0.9195629
## D_palmitoyl.dihydrosphingomyelin..d18.0.16.0..
## 0.9182698
## D_X3.hydroxypyridine.sulfate
## 0.6975682
We can visualise the top variables in a heatmap:
Reminder: high values for the variables that start with “couple_” mean that the difference of expression between the donor and the recipient of the same couple was high for this specific feature.
We first combine the features that were selected as highly informative to classify between primary and secondary tolerant patients from the couples, recipients and donors into one big table:
We can generate a PCA and a tSNE using all these features to see how the tolerant and non tolerant patients are represented in the reduced dimensions:
The 2 first components of the PCA seem to hold quite a lot of the variability in the data:
It seems like the primary and secondary patients are quite mixed in the PCA.
We can also generate a tSNE on the same data:
We can see if building a model using all these features combined leads to good classification results between primary and secondary tolerant patients:
## predicted
## true primary_tolerant secondary_tolerant
## primary_tolerant 13 5
## secondary_tolerant 8 8
## [1] "prediction error: 38.2352941176471"
We can see which features were selected in the model to classify primary and secondary tolerant patients in the St Louis cohort:
## R_X3.aminoisobutyrate
## 2.546789
## subst_X1.oleoyl.2.docosahexaenoyl.GPC..18.1.22.6..
## 2.201399
## subst_sphingosine.1.phosphate
## 2.152760
## subst_X2.palmitoyl.GPC..16.0..
## 2.139724
## subst_X1.palmitoleoyl.2.linoleoyl.GPC..16.1.18.2..
## 1.983336
## D_palmitoyl.dihydrosphingomyelin..d18.0.16.0..
## 1.907515
## subst_sphingosine
## 1.803902
## R_X2.palmitoyl.GPC..16.0..
## 1.729516
We can visualise the top variables in a heatmap:
Reminder: high values for the variables that start with “couple_” mean that the difference of expression between the donor and the recipient of the same couple was high for this specific feature.
Summary of the number of significative features found overall, between non tolerant and tolerant patients:| type | threshold | St Louis | Cryostem | both | behaviour |
|---|---|---|---|---|---|
| Recip_TNT | 0.95 | 158 | 59 | 21 | 20 |
| Recip_TNT | 0.9 | 191 | 93 | 44 | 42 |
| Donors_TNT | 0.95 | 76 | 23 | 6 | 4 |
| Donors_TNT | 0.9 | 112 | 55 | 12 | 6 |
| D-R_TNT | 0.95 | 90 | 88 | 17 | 1 |
| D-R_TNT | 0.9 | 120 | 136 | 32 | 3 |
| type | threshold | St Louis | Cryostem | both | behaviour |
|---|---|---|---|---|---|
| Recip_PS | 0.95 | 31 | 32 | 0 | 0 |
| Recip_PS | 0.9 | 50 | 61 | 3 | 2 |
| Donors_PS | 0.95 | 25 | 17 | 1 | 0 |
| Donors_PS | 0.9 | 45 | 38 | 3 | 1 |
| D-R_PS | 0.95 | 41 | 37 | 1 | 1 |
| D-R_PS | 0.9 | 77 | 63 | 8 | 5 |